Sensor-based remote health monitoring is used in industrial, urban and healthcare settings to monitor the ongoing operation of equipment and human health. An important aim is to intervene early if anomalous events or adverse health are detected. In the wild, these anomaly detection approaches are challenged by noise, label scarcity, high dimensionality, explainability and wide variability in operating environments. The Contextual Matrix Profile (CMP) is a configurable two-dimensional version of the Matrix Profile (MP) that uses the distance matrix of all subsequences of a time series to discover patterns and anomalies. The CMP has been shown to enhance the effectiveness of the MP and other SOTA methods at detecting, visualising and interpreting true anomalies in noisy real-world data from different domains. It excels at zooming out and identifying temporal patterns at configurable time scales. However, the CMP does not address cross-sensor information, and cannot scale to high-dimensional data. We propose a novel, self-supervised graph-based approach for temporal anomaly detection that operates on context graphs generated from the CMP distance matrix. The learned graph embeddings encode the anomalous nature of a time context. In addition, we evaluate other graph outlier algorithms for the same task. Since our pipeline is modular, the graph construction, generation of graph embeddings, and pattern recognition logic can all be chosen to suit the specific pattern detection application. We verified the effectiveness of graph-based anomaly detection and compared it with the CMP and three state-of-the-art methods on two real-world healthcare datasets with different anomalies. Our proposed method demonstrated better recall, alert rate and generalisability.
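As a toy illustration of the idea (not the authors' implementation; the window size, distance measure and degree-based scoring rule are all illustrative), a CMP cell can be taken as the minimum subsequence distance between two time contexts, and a context graph built by connecting contexts whose CMP distance is small; poorly connected contexts become anomaly candidates:

```python
# Toy sketch of a Contextual Matrix Profile (CMP) cell and a context graph.
# All names and thresholds are illustrative, not the authors' implementation.
import math

def subsequences(series, m):
    """All length-m sliding windows of a series."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]

def dist(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cmp_cell(context_a, context_b, m):
    """One CMP cell: minimum subsequence distance between two contexts."""
    return min(dist(sa, sb)
               for sa in subsequences(context_a, m)
               for sb in subsequences(context_b, m))

# Three daily contexts: two regular days and one with anomalous behaviour.
day1 = [0, 1, 0, 1, 0, 1, 0, 1]
day2 = [0, 1, 0, 9, 0, 9, 0, 1]   # anomalous spikes
day3 = [0, 1, 0, 1, 0, 1, 0, 1]
days = {"day1": day1, "day2": day2, "day3": day3}

# Context graph: connect contexts with small CMP distance; contexts with
# few connections are anomaly candidates.
edges = {(a, b) for a in days for b in days
         if a < b and cmp_cell(days[a], days[b], m=4) < 1.0}
degree = {d: sum(d in e for e in edges) for d in days}
print(degree)  # day2 ends up isolated (degree 0)
```

Here the anomaly signal is simply graph degree; in the pipeline described above, this last step would instead be learned graph embeddings or an off-the-shelf graph outlier algorithm.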
Motion estimation approaches typically employ sensor fusion techniques, such as the Kalman filter, to handle individual sensor failures. More recently, deep-learning-based fusion approaches have been proposed, improving performance and requiring less model-specific implementation. However, current deep fusion approaches often assume that the sensors are synchronised, which is not always practical, especially for low-cost hardware. To address this limitation, in this work we propose AFT-VO, a novel transformer-based sensor fusion architecture for estimating VO from multiple sensors. Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources. Our approach first employs a Mixture Density Network (MDN) to estimate the probability distribution of the 6-DoF pose for each camera in the system. A novel transformer-based fusion module, AFT-VO, is then introduced, which combines these asynchronous pose estimates along with their confidences. More specifically, we introduce discretiser and source encoding techniques which enable the fusion of multi-source asynchronous signals. We evaluate our approach on the popular nuScenes and KITTI datasets. Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in challenging weather and lighting conditions.
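A minimal sketch of the fusion problem in plain Python, assuming 1-D poses and hand-chosen variances (AFT-VO learns this fusion with a transformer; the linear interpolation and inverse-variance weighting below are a simplified stand-in for combining asynchronous, confidence-weighted estimates):

```python
# Simplified stand-in for asynchronous multi-camera pose fusion:
# interpolate each camera's (time, pose, variance) stream to a common query
# time, then take a confidence (inverse-variance) weighted average.
# Names and numbers are illustrative, not from AFT-VO.

def interpolate(stream, t):
    """Linearly interpolate two (time, pose, var) samples to time t."""
    (t0, p0, v0), (t1, p1, v1) = stream
    w = (t - t0) / (t1 - t0)
    return (1 - w) * p0 + w * p1, (1 - w) * v0 + w * v1

def fuse(streams, t):
    """Inverse-variance weighted fusion of interpolated estimates."""
    est = [interpolate(s, t) for s in streams]
    weights = [1.0 / v for _, v in est]
    return sum(w * p for (p, _), w in zip(est, weights)) / sum(weights)

# Two cameras sampled at different times (asynchronous); 1-D pose for brevity.
cam_a = [(0.0, 1.0, 0.1), (1.0, 2.0, 0.1)]   # confident camera
cam_b = [(0.2, 1.0, 0.9), (1.2, 3.0, 0.9)]   # noisy camera
print(fuse([cam_a, cam_b], t=0.5))           # pulled towards the confident camera
```

The fused estimate lands near the confident camera's interpolated pose (1.5) rather than splitting the difference, which is the qualitative behaviour one wants from confidence-aware fusion.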
In modern machine learning research, the ability to generalise to previously unseen tasks is arguably a key challenge. It is also a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must be able to adapt to unknown environments on the fly. Researchers often rely on reinforcement and imitation learning to adapt to new tasks online through trial-and-error learning. However, this can be challenging for complex tasks which require many timesteps or large numbers of subtasks to complete. These "long-horizon" tasks suffer from sample inefficiency and may require extremely long training times before the agent can learn to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an imitation learning agent using adaptive "near-future" subgoals. These subgoals are recomputed at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency on standard long-horizon tasks, this approach also enables one-shot generalisation to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional imitation learning approach by 30%.
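The compositional subgoal arithmetic can be sketched with plain vectors standing in for learned latent embeddings (a hypothetical illustration, not the paper's model: the function name and the exact arithmetic are assumptions):

```python
# Hypothetical sketch of 'near-future' subgoal computation by compositional
# arithmetic in a latent space. Plain lists stand in for learned embeddings.

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def near_future_subgoal(z_current, z_ref_now, z_ref_future):
    """Transfer the reference trajectory's short-horizon direction of
    progress onto the agent's current latent state."""
    return add(z_current, sub(z_ref_future, z_ref_now))

# Reference trajectory recorded in a *different* environment.
z_ref_now, z_ref_future = [0.0, 0.0], [1.0, 0.5]
# Agent's current latent state in the new environment.
z_current = [2.0, 2.0]
print(near_future_subgoal(z_current, z_ref_now, z_ref_future))  # [3.0, 2.5]
```

Recomputing this at every step keeps the subgoal in the "near future" of wherever the agent currently is, rather than fixing subgoals in advance.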
In this work, we introduce a new perspective on learning transferable content in multi-task imitation learning. Humans are able to transfer skills and knowledge: if we can cycle to work and drive to the store, we can also cycle to the store and drive to work. Drawing inspiration from this, we hypothesise that the latent memory of a policy network can be divided into two partitions, containing either knowledge of the environmental context of the task, or the generalisable skills needed to solve it. This can improve training efficiency and enable better generalisation of skills within the same environment and of the same task across unseen environments. We used the proposed approach to train disentangled agents in two different multi-task IL environments. In both cases, our task success rate exceeded the SOTA by 30%. We also demonstrated this on navigation with a real robot.
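The partitioning idea can be illustrated with a toy latent vector split into a context half and a skill half (purely hypothetical dimensions and values; the paper learns this disentanglement rather than hard-coding it):

```python
# Toy illustration of a disentangled policy latent: the first k dimensions
# hold environment context, the rest hold a generalisable skill, so the two
# can be recombined across (environment, task) pairs. Values are invented.

def split(latent, k):
    """Split a latent into (environment context, task skill) partitions."""
    return latent[:k], latent[k:]

def recombine(context, skill):
    """Build a new latent from any context/skill pairing."""
    return context + skill

# Latents observed for two (environment, task) combinations.
bike_to_work  = [1.0, 1.0, 0.1, 0.2]   # env: work route,  skill: cycling
drive_to_shop = [5.0, 5.0, 0.8, 0.9]   # env: shop route,  skill: driving

work_ctx, cycle_skill = split(bike_to_work, k=2)
shop_ctx, drive_skill = split(drive_to_shop, k=2)

# Zero-shot recombination: cycle to the shop, drive to work.
print(recombine(shop_ctx, cycle_skill))   # [5.0, 5.0, 0.1, 0.2]
print(recombine(work_ctx, drive_skill))   # [1.0, 1.0, 0.8, 0.9]
```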
Visual odometry (VO) estimation is an important source of information for vehicle state estimation and autonomous driving. Recently, deep-learning-based approaches have begun to appear in the literature. However, in the context of driving, single-sensor-based approaches are often prone to failure because of degraded image quality caused by environmental factors, camera placement, and so on. To address this issue, we propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimates from multiple on-board cameras. We use a hybrid CNN-RNN model to extract short-temporal feature representations from a set of consecutive images. We then utilise a Mixture Density Network (MDN) to estimate the 6-DoF pose as a mixture of distributions, and a fusion module to estimate the final pose using the MDN outputs from the multiple cameras. We evaluate our approach on the publicly available, large-scale autonomous vehicle dataset nuScenes. The results show that the proposed fusion approach surpasses the state of the art and provides robust estimates and accurate trajectories compared with single-camera-based estimation.
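A hedged sketch of the MDN-plus-fusion step for a single pose dimension (the real model predicts full 6-DoF distributions and learns the fusion; collapsing each camera's mixture to its moments and inverse-variance weighting them is a simplified stand-in):

```python
# Each camera's MDN outputs mixture weights, means and sigmas (here for one
# pose dimension). We collapse each mixture to its mean/variance, then fuse
# cameras by inverse-variance weighting. A simplified stand-in, not the
# paper's learned fusion module.

def mixture_moments(weights, means, sigmas):
    """Mean and variance of a 1-D Gaussian mixture (law of total variance)."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (s ** 2 + m ** 2)
              for w, m, s in zip(weights, means, sigmas)) - mean ** 2
    return mean, var

def fuse_cameras(mdn_outputs):
    """Inverse-variance weighted pose from several cameras' MDN outputs."""
    moments = [mixture_moments(*o) for o in mdn_outputs]
    prec = [1.0 / v for _, v in moments]
    return sum(p * m for (m, _), p in zip(moments, prec)) / sum(prec)

# (weights, means, sigmas) per camera; values invented for illustration.
front = ([0.5, 0.5], [1.0, 1.0], [0.1, 0.1])   # sharp, self-consistent
rear  = ([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])   # ambiguous, high variance
print(fuse_cameras([front, rear]))             # stays close to 1.0
```

The ambiguous rear camera barely moves the fused estimate, because its mixture variance (which includes the spread between its modes) is large.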
Visual odometry (VO) is used in many applications, including robotics and autonomous systems. However, traditional approaches based on feature matching are computationally expensive and do not directly address failure cases, relying instead on heuristic methods to detect failure. In this work, we propose a deep-learning-based VO model to efficiently estimate 6-DoF poses, along with a confidence model for these estimates. We utilise a CNN-RNN hybrid model to learn feature representations from image sequences. We then employ a Mixture Density Network (MDN), which estimates camera motion as a mixture of Gaussians based on the extracted spatio-temporal representations. Our model uses pose labels as a source of supervision, but derives uncertainties in an unsupervised manner. We evaluate the proposed model on the KITTI and nuScenes datasets and report extensive quantitative and qualitative results to analyse the performance of both pose and uncertainty estimation. Our experiments show that the proposed model exceeds state-of-the-art performance, in addition to detecting failure cases using the predicted pose uncertainty.
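Two pieces of this pipeline are easy to illustrate in a few lines: the Gaussian-mixture negative log-likelihood through which an MDN learns uncertainty without uncertainty labels, and thresholding the predicted variance to flag failure frames (the threshold and all numbers are illustrative):

```python
# (1) The mixture NLL: minimising it against pose labels alone teaches the
#     MDN both where the pose is and how confident to be.
# (2) Failure detection: flag frames whose predicted variance is too high.
# Threshold and values are illustrative, not from the paper.
import math

def mixture_nll(x, weights, means, sigmas):
    """Negative log-likelihood of x under a 1-D Gaussian mixture."""
    p = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sigmas))
    return -math.log(p)

# A sharp, correct prediction scores a lower loss than a diffuse one,
# so training pressure alone calibrates the predicted uncertainty.
confident = mixture_nll(1.0, [1.0], [1.0], [0.1])
diffuse   = mixture_nll(1.0, [1.0], [1.0], [2.0])
print(confident < diffuse)  # True

def flag_failures(frame_vars, threshold):
    """At test time, flag frames whose predicted variance is too high."""
    return [i for i, v in enumerate(frame_vars) if v > threshold]

# Predicted translation variance per frame; frame 2 could be a glare frame.
print(flag_failures([0.02, 0.03, 0.90, 0.04], threshold=0.5))  # [2]
```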
In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive as they are endowed with certificates of optimality as well as the toolkit of convex analysis, but exhibit a computational scaling that makes them ill-suited beyond moderate-sized problem instances. On the other hand, nonconvex regularizers can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogenous regularizers for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.
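For reference, the correspondence the abstract invokes can be stated in standard notation (this is textbook material, not notation taken from the paper):

```latex
% Regularised estimation: data fidelity plus a regulariser R
\min_{x \in \mathbb{R}^d} \; \tfrac{1}{2}\|Ax - y\|_2^2 \;+\; \lambda\, R(x)

% A continuous, positively homogeneous R is the gauge of a body K:
R(x) \;=\; \|x\|_K \;:=\; \inf\{\, t > 0 \;:\; x \in tK \,\}
```

When $K$ is a convex body the gauge is a convex regularizer (e.g. $K$ the cross-polytope gives the $\ell_1$ norm); when $K$ is merely a star body the gauge can be nonconvex. This is the convex-body/star-body correspondence under which the paper searches for optimal regularizers.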
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present day 3D repositories in terms of scale, number of categories, and in the visual diversity of instances within a category. We demonstrate the large potential of Objaverse via four diverse applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models. Objaverse can open new directions for research and enable new applications across the field of AI.
System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition in both science and engineering across different fields. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping from the current state and action to the next state. This problem is commonly defined directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to be learned, for example delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation when using bootstrapped predictions (predictions based on past predictions) over large time horizons. Here we explore the use of Reinforcement Learning for this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present experimental results that demonstrate RL is a promising technique to solve this kind of problem.
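The error accumulation from bootstrapped predictions mentioned above is easy to demonstrate with a toy linear system and a slightly biased learned model (both invented for illustration):

```python
# Toy illustration of bootstrapped-prediction error accumulation: a learned
# forward model with a tiny one-step bias, rolled out on its own predictions,
# drifts further from the true dynamics as the horizon grows.
# The dynamics and the bias are made up for illustration.

def true_step(s):
    return 0.9 * s + 1.0          # the true (unknown) dynamics

def learned_step(s):
    return 0.9 * s + 1.05         # learned model with a small bias

def rollout(step, s0, horizon):
    """Apply a one-step model repeatedly to its own output (bootstrapping)."""
    s = s0
    for _ in range(horizon):
        s = step(s)
    return s

errs = [abs(rollout(learned_step, 0.0, h) - rollout(true_step, 0.0, h))
        for h in (1, 10, 50)]
print([round(e, 3) for e in errs])  # grows with the horizon
```

A 0.05 one-step error compounds to roughly ten times that over fifty steps here; with less forgiving (e.g. unstable) dynamics the compounding is far worse, which is the motivation for treating long-horizon model learning as something other than plain one-step supervised regression.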
Background: Encouraged by the success of pretrained Transformer models in many natural language processing tasks, their use for International Classification of Diseases (ICD) coding tasks is now actively being explored. In this study, we investigate three types of Transformer-based models, aiming to address the extreme label set and long text classification challenges that are posed by automated ICD coding tasks. Methods: The Transformer-based model PLM-ICD achieved the current state-of-the-art (SOTA) performance on the ICD coding benchmark dataset MIMIC-III. It was chosen as our baseline model to be further optimised. XR-Transformer, the new SOTA model in the general extreme multi-label text classification domain, and XR-LAT, a novel adaptation of the XR-Transformer model, were also trained on the MIMIC-III dataset. XR-LAT is a recursively trained model chain on a predefined hierarchical code tree with label-wise attention, knowledge transferring and dynamic negative sampling mechanisms. Results: Our optimised PLM-ICD model, which was trained with longer total and chunk sequence lengths, significantly outperformed the current SOTA PLM-ICD model, and achieved the highest micro-F1 score of 60.8%. The XR-Transformer model, although SOTA in the general domain, did not perform well across all metrics. The best XR-LAT based model obtained results that were competitive with the current SOTA PLM-ICD model, including improving the macro-AUC by 2.1%. Conclusion: Our optimised PLM-ICD model is the new SOTA model for automated ICD coding on the MIMIC-III dataset, while our novel XR-LAT model performs competitively with the previous SOTA PLM-ICD model.
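Since the abstract reports both micro-F1 and a macro-averaged metric, it may help to recall how the two behave on imbalanced ICD-style label sets: micro-averaging pools counts over all labels, so frequent codes dominate, while macro metrics weight rare codes equally. A from-scratch sketch with invented codes and documents:

```python
# Micro- vs macro-averaged F1 for multi-label classification, from scratch.
# The ICD codes and documents below are invented toy data.

def f1_counts(tp, fp, fn):
    """F1 from true-positive / false-positive / false-negative counts."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def micro_macro_f1(gold, pred, labels):
    counts = {l: [0, 0, 0] for l in labels}        # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for l in labels:
            if l in p and l in g:
                counts[l][0] += 1
            elif l in p:
                counts[l][1] += 1
            elif l in g:
                counts[l][2] += 1
    micro = f1_counts(*[sum(c[i] for c in counts.values()) for i in range(3)])
    macro = sum(f1_counts(*c) for c in counts.values()) / len(labels)
    return micro, macro

gold = [{"I10"}, {"I10"}, {"I10", "C50"}, {"I10"}]
pred = [{"I10"}, {"I10"}, {"I10"}, {"I10"}]        # misses the rare code C50
micro, macro = micro_macro_f1(gold, pred, ["I10", "C50"])
print(round(micro, 3), round(macro, 3))            # 0.889 0.5
```

Missing the single rare code barely dents micro-F1 but halves the macro score, which is why papers in this area report both.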